EXHIBIT A 



Introduction to 
Protein Structure 

Second Edition 



Carl Branden 


Microbiology and Tumor Biology Center 

Karolinska Institute 

Stockholm 

Sweden 



John Tooze 

Imperial Cancer Research Fund Laboratories 

Lincoln's Inn Fields

London 

UK 



THE COVER 



Front: The structure of the potassium channel from Streptomyces lividans,
determined by Roderick MacKinnon at the Rockefeller University, New York.
As discussed in Chapter 12, this structure, the first of such an ion channel,
shows how the channel allows the passage of potassium ions through cell
membranes with high efficiency and selectivity. The view is looking down
the protein as it sits in the cell membrane, as seen from outside the cell, with
a potassium ion shown in gold. This image was produced using the GRASP
program (A. Nicholls and B. Honig, Columbia University) from atomic
coordinates kindly provided by Roderick MacKinnon.

Back: A hand-drawn image of the potassium channel, in the same view as 
on the front cover, with each subunit of the tetrameric protein shown in a
different color. 

Cover design by Christopher Thorpe and Nigel Orme.




© 1991, 1999 Carl Branden and John Tooze



All rights reserved. No part of this book covered by the copyright hereon may
be reproduced or used in any form or by any means (graphic, electronic, or
mechanical, including photocopying, recording, taping, or information
storage and retrieval systems) without permission of the publisher.



Visit the Introduction to Protein Structure Web site:
http://www.proteinstructure.com/

For information on other textbooks available from Garland Publishing, visit:
http://www.garlandpub.com/ 



Library of Congress Cataloging-in-Publication Data
Branden, Carl-Ivar

Introduction to protein structure / Carl-Ivar Branden, John Tooze.
-2nd ed.
p. cm.

Includes bibliographical references and index.
ISBN 0-8153-2304-2 (hardcover). -ISBN 0-8153-2305-0 (pbk.)
1. Proteins-Structure. I. Tooze, John. II. Title.
QP551.B7635 1999

572'.633-dc21 98-34487

CIP

Published by Garland Publishing, Inc.

19 Union Square West, New York, NY 10003-3382 

Printed in the United States of America 



15 14 13 12 11 10 9 8 7 6 5 4 3 2 1



The Building Blocks 



1 



Recombinant DNA techniques have provided tools for the rapid determination 
of DNA sequences and, by inference, the amino acid sequences of proteins 
from structural genes. The number of such sequences is now increasing 
almost exponentially, but by themselves these sequences tell little more 
about the biology of the system than a New York City telephone directory 
tells about the function and marvels of that city. 

The proteins we observe in nature have evolved, through selective
pressure, to perform specific functions. The functional properties of proteins
depend upon their three-dimensional structures. The three-dimensional
structure arises because particular sequences of amino acids in polypeptide
chains fold to generate, from linear chains, compact domains with specific
three-dimensional structures (Figure 1.1). The folded domains can serve as
modules for building up large assemblies such as virus particles or muscle
fibers, or they can provide specific catalytic or binding sites, as found in
enzymes or proteins that carry oxygen or that regulate the function of DNA.

To understand the biological function of proteins we would therefore 
like to be able to deduce or predict the three-dimensional structure from the 
amino acid sequence. This we cannot do. In spite of considerable efforts over 
the past 25 years, this folding problem is still unsolved and remains one of
the most basic intellectual challenges in molecular biology. 



Figure 1.1 The amino acid sequence of a
protein's polypeptide chain is called its
primary structure. Different regions of the
sequence form local regular secondary
structures, such as alpha (α) helices or beta (β)
strands. The tertiary structure is formed by
packing such structural elements into one or
several compact globular units called domains.
The final protein may contain several
polypeptide chains arranged in a quaternary
structure. By formation of such tertiary and
quaternary structure amino acids far apart in
the sequence are brought close together in
three dimensions to form a functional region,
an active site.



[Figure 1.1 panel labels: Primary, Secondary, Tertiary, Quaternary]



Because the three-dimensional structures of proteins cannot yet be
predicted, they must instead be determined experimentally by x-ray
crystallography, electron crystallography or nuclear magnetic resonance (NMR)
techniques. Over the past 30 years the structures of several thousand proteins
have been solved, and the sequences of more than 500,000 have been
determined. This has generated a body of information from which a set of basic
principles of protein structure has emerged. These principles make it easier
for us to understand how protein structure is generated, to identify common
structural themes, to relate structure to function, and to see fundamental
relationships between different proteins. The science of protein structure is at
the stage of taxonomy where we can begin to discern patterns and motifs
among the relatively small number of proteins whose three-dimensional
structure is known.

The first six chapters of this book deal with the basic principles of
protein structure as we understand them today, and examples of the different
major classes of protein structures are presented. Chapter 7 contains a brief
discussion on DNA structures with emphasis on recognition by proteins of
specific nucleotide sequences. The remaining chapters illustrate how during
evolution different structural solutions have been selected to fulfill particular
functions.

Proteins are polypeptide chains 

All of the 20 amino acids have in common a central carbon atom (Cα) to
which are attached a hydrogen atom, an amino group (NH2), and a carboxyl
group (COOH) (Figure 1.2a). What distinguishes one amino acid from
another is the side chain attached to the Cα through its fourth valence. There
are 20 different side chains specified by the genetic code; others occur, in rare
cases, as the products of enzymatic modifications after translation.

Amino acids are joined end-to-end during protein synthesis by the
formation of peptide bonds when the carboxyl group of one amino acid
condenses with the amino group of the next to eliminate water (Figure 1.2b).
This process is repeated as the chain elongates. One consequence is that the
amino group of the first amino acid of a polypeptide chain and the carboxyl
group of the last amino acid remain intact, and the chain is said to extend
from its amino terminus to its carboxy terminus. The formation of a succession
of peptide bonds generates a "main chain," or "backbone," from which
project the various side chains.

The main-chain atoms are a carbon atom Cα to which the side chain is
attached, an NH group bound to Cα, and a carbonyl group C'=O, where the
carbon atom C' is attached to Cα. These units, or residues, are linked into a
polypeptide by a peptide bond between the C' atom of one residue and the
nitrogen atom of the next (see Figure 1.2b). The basic repeating unit along
the main chain from a biochemical or genetic viewpoint is thus
(NH-CαH-C'=O), which is the residue of the common parts of amino acids
after peptide bonds have been formed (see Figure 1.2b).

The genetic code specifies 20 different amino acid side chains
The 20 different side chains that occur in proteins are shown in Panel 1.1 (pp.
6-7). Their names are abbreviated with both a three-letter and a one-letter
code, which are also given in the panel. The one-letter codes are worth
memorizing, as they are widely used in the literature. A mnemonic device for
linking the one-letter code to the names of the amino acids is given in Panel 1.1.



Figure 1.2 Proteins are built up by amino
acids that are linked by peptide bonds to form
a polypeptide chain. (a) Schematic diagram of
an amino acid, illustrating the nomenclature
used in this book. A central carbon atom (Cα)
is attached to an amino group (NH2), a
carboxyl group (COOH), a hydrogen atom (H),
and a side chain (R). (b) In a polypeptide chain
the carboxyl group of amino acid n has formed
a peptide bond, C'-N, to the amino group
of amino acid n + 1. One water molecule is
eliminated in this process. The repeating units,
which are called residues, are divided into
main-chain atoms and side chains. The
main-chain part, which is identical in all
residues, contains a central Cα atom attached
to an NH group, a C'=O group, and an H atom.
The side chain R, which is different for
different residues, is bound to the Cα atom.



[Figure 1.2 labels: side chain, carboxyl group, peptide bond]




The amino acids are usually divided into three different classes defined
by the chemical nature of the side chain. The first class comprises those with
strictly hydrophobic side chains: Ala (A), Val (V), Leu (L), Ile (I), Phe (F), Pro
(P), and Met (M). The four charged residues, Asp (D), Glu (E), Lys (K), and Arg
(R), form the second class. The third class comprises those with polar side
chains: Ser (S), Thr (T), Cys (C), Asn (N), Gln (Q), His (H), Tyr (Y), and Trp
(W). The amino acid glycine (G), which has only a hydrogen atom as a side
chain and so is the simplest of the 20 amino acids, has special properties and
is usually considered either to form a fourth class or to belong to the first
class.
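
The classification above can be written down as a small lookup table. The following is a minimal sketch, not part of the original text: the names SIDE_CHAIN_CLASSES, classify and class_composition are hypothetical, and glycine is placed in a class of its own, which is only one of the two options the text mentions.

```python
from collections import Counter

# Side-chain classes as listed in the text, keyed by one-letter code.
# Assumption: glycine is kept as a separate fourth class here; the text
# notes it may alternatively be counted with the hydrophobic class.
SIDE_CHAIN_CLASSES = {
    "hydrophobic": "AVLIFPM",
    "charged": "DEKR",
    "polar": "STCNQHYW",
    "glycine": "G",
}

# Reverse lookup: one-letter code -> class name.
CLASS_OF = {
    code: name
    for name, codes in SIDE_CHAIN_CLASSES.items()
    for code in codes
}


def classify(residue: str) -> str:
    """Return the side-chain class for a one-letter amino acid code."""
    try:
        return CLASS_OF[residue.upper()]
    except KeyError:
        raise ValueError(f"not a standard one-letter code: {residue!r}") from None


def class_composition(sequence: str) -> Counter:
    """Count how many residues of a sequence fall into each class."""
    return Counter(classify(residue) for residue in sequence)


if __name__ == "__main__":
    # Hypothetical five-residue peptide used only as an illustration.
    print(class_composition("GAVDK"))
```

Keeping the table keyed by class name makes it easy to move glycine into the hydrophobic class if the other convention is preferred.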

The four groups attached to the Cα atom are chemically different for all
the amino acids except glycine, where two H atoms bind to Cα. All amino
acids except glycine are thus chiral molecules that can exist in two different
forms with different "hands," L- or D-form (Figure 1.3).

Biological systems depend on specific detailed recognition of molecules
that distinguish between chiral forms. The translation machinery for protein
synthesis has evolved to utilize only one of the chiral forms of amino acids,
the L-form. All amino acids that occur in proteins therefore have the L-form.
There is, however, no obvious reason why the L-form was chosen during
evolution and not the D-form.



[Figure 1.3 labels: L-form, D-form]



Figure 1.3 The "handedness" of amino acids.
Looking down the H-Cα bond from the
hydrogen atom, the L-form has CO, R, and
N substituents from Cα going in a clockwise
direction. There is a mnemonic to remember
this: for the L-form the groups read CORN in
clockwise direction.



Cysteines can form disulfide bridges 

Two cysteine residues in different parts of the polypeptide chain but adjacent 
in the three-dimensional structure of a protein can be oxidized to form a 
disulfide bridge (Figure 1.4). The disulfide is usually the end product of air
oxidation according to the following reaction scheme:

2 -CH₂SH + ½ O₂ ⇌ -CH₂-S-S-CH₂- + H₂O

This reaction requires an oxidative environment, and such disulfide bridges
are usually not found in intracellular proteins, which spend their lifetime in
an essentially reductive environment. Disulfide bridges do, however, occur
quite frequently among extracellular proteins that are secreted from cells,
and in eukaryotes, formation of these bridges occurs within the lumen of the
endoplasmic reticulum, the first compartment of the secretory pathway. 



[Figure 1.4 labels: cystine, cysteine]




Figure 1.4 The disulfide is usually the end
product of air oxidation according to the
following schematic reaction scheme:

2 -CH₂SH + ½ O₂ ⇌ -CH₂-S-S-CH₂- + H₂O

Disulfide bonds form between the side chains
of two cysteine residues. Two SH groups from
cysteine residues, which may be in different
parts of the amino acid sequence but adjacent
in the three-dimensional structure, are
oxidized to form one S-S (disulfide) group.



[Panel 1.1: structural formulas of the amino acid side chains. Legible labels: Ala, Alanine; Val, Valine; Phe, Phenylalanine; Ser, Serine; Leu, Leucine; Glu, Glutamic acid; Thr, Threonine; Tyr, Tyrosine; Cys, Cysteine; Asn, Asparagine; Gln, Glutamine]

Folding and 
Flexibility 



A protein, as we have seen, is a polypeptide chain folded into one or more
domains, each of which is made up of α helices, β sheets and loops. The
process by which a polypeptide chain acquires its correct three-dimensional
structure to achieve the biologically active native state is called protein
folding. Although some polypeptide chains spontaneously fold into the native
state, others require the assistance of enzymes, for example, to catalyze the
formation and exchange of disulfide bonds; and many require the assistance
of a class of proteins called chaperones. A chaperone binds to a partly folded
polypeptide chain and prevents it from making illicit associations with other
folded or partly folded proteins, hence the name chaperone. A chaperone
also promotes the folding of the polypeptide chain it holds. After a
polypeptide has acquired most of its correct secondary structure, with the α helices
and β sheets formed, it has a looser tertiary structure than the native
state and is said to be in the molten globular state. The compaction that is
necessary to go from the molten globular state to the final native state occurs
spontaneously.

Protein folding generates a particular three-dimensional structure from
an essentially linear, one-dimensional structure, a polypeptide chain with a
particular sequence of amino acid residues. How to predict the
three-dimensional structure of a protein from its amino acid sequence is the major
unsolved problem in structural molecular biology. If we had a general
solution to the protein folding problem, it would be possible to write a computer
program to simulate protein folding and generate the precise
three-dimensional structure of any protein from its amino acid sequence. However, a
general solution to the folding problem is still not in sight, even though the
number of proteins whose three-dimensional structure has been solved
experimentally, in other words, the database of known protein structures, is
doubling every 2 years.

A protein in its native state is not static. The secondary structural
elements of the domains as well as the entire domains continually undergo
small movements in space, either fluctuations of individual atoms or
collective motions of groups of atoms. Furthermore, the functional activities of
many proteins depend upon large conformational changes triggered by
ligand binding. In this chapter, after discussing protein folding, we shall
examine some examples of functionally important conformational changes of
proteins.



[Figure 6.1 labels: unfolded, folded]

Figure 6.1 A polypeptide chain is extended
and flexible in the unfolded, denatured state
whereas it is globular and compact in the
folded, native state.



Globular proteins are only marginally stable

Every biochemist or molecular biologist who has worked with proteins
knows by experience that they are unstable. Slight changes in pH or
temperature can convert a solution of biologically active protein molecules in their
native state to a biologically inactive denatured state. The energy difference
between these two states in physiological conditions is quite small, about
5-15 kcal/mol, not much more than the energy contribution of a single
hydrogen bond, which is of the order of 2-5 kcal/mol.

There are two major contributors to the energy difference between the
folded and the denatured state: enthalpy and entropy. Enthalpy derives from
the energy of the noncovalent interactions within the polypeptide chain:
the hydrophobic interactions, hydrogen bonds and ionic bonds. The
covalent bonds within and between the amino acid residues in the polypeptide
chain are the same in the native and denatured states, with the exceptions of
disulfide bonds in those proteins where these form between cysteine
residues. The noncovalent interactions on the other hand differ significantly
between the two states. In the native state these interactions are maximized
to produce a compact globular molecule with a tightly packed hydrophobic
core whereas the denatured state is more open and the side chains are more
loosely packed (Figure 6.1). These noncovalent interactions are therefore
stronger and more frequent in the native state and hence their energy
contribution, enthalpy, is much larger. The enthalpy difference between native
and denatured states can reach several hundred kcal/mol.

Entropy derives from the second law of thermodynamics which states
that energy is required to create order. Proteins in the native state are highly
ordered in one main conformation whereas the denatured state is highly
disordered, with the protein molecules in many different conformations. A
typical experimental preparation of unfolded protein (a solution in 6 M
guanidinium chloride or 8 M urea) contains 10^15 to 10^20 protein molecules,
each of which will have a unique conformation. In the absence of compensating
factors it would therefore be entropically much more favorable for the protein
to be in the disordered denatured state. The energy difference due to entropy
between the native ordered state and the denatured state can also reach
several hundred kcal/mol but in the opposite direction to the enthalpy
difference. The total energy difference between the native and the denatured state
of 5-15 kcal/mol, which is called the free energy difference, is thus a
difference between two large numbers, the enthalpy difference and the entropy
difference. The fact that this difference is very small is a severe complicating
factor both for predictions of possible native states and for interpretation of
factors responsible for the stability or instability of protein molecules,
because our knowledge about the denatured state is very incomplete.
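
The "difference between two large numbers" can be made concrete with the standard relation ΔG = ΔH - TΔS. The sketch below is illustrative only: the values of -200 and -190 kcal/mol are hypothetical round numbers chosen to fall in the "several hundred kcal/mol" range and to give a net stability inside the 5-15 kcal/mol window quoted above; they are not measurements for any protein.

```python
def free_energy_difference(delta_h: float, t_delta_s: float) -> float:
    """Return dG = dH - T*dS for folding, with all terms in kcal/mol."""
    return delta_h - t_delta_s


# Hypothetical values for the denatured -> native transition (kcal/mol).
# Enthalpy favors folding (negative); the entropy term opposes it
# (T*dS is negative, so -T*dS is positive).
DELTA_H = -200.0
T_DELTA_S = -190.0

# Net stability: a small number left over from two large ones.
dg = free_energy_difference(DELTA_H, T_DELTA_S)

# A 5% error in the enthalpy estimate alone is as large as the
# whole net stability of the folded state.
dg_perturbed = free_energy_difference(DELTA_H * 0.95, T_DELTA_S)

if __name__ == "__main__":
    print(f"dG           = {dg:6.1f} kcal/mol")
    print(f"dG (dH -5%)  = {dg_perturbed:6.1f} kcal/mol")
```

With these assumed inputs, a modest relative error in either large term is enough to change the predicted stability completely, which is the complication the paragraph describes.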

We know much more about factors that influence the stability of the
native state, mainly from experiments using directed mutations in proteins 
of known three-dimensional structure. Such experiments have yielded 






precise information about energy contributions to the stability of the native
state from close packing of hydrophobic side chains in the interior of the
protein, and from the presence of disulfide bridges and interior hydrogen bonds
and salt bridges, as well as from side chains that compensate the dipole
moment of α helices (see Chapter 3).

The marginal stability of the native state over the denatured state is 
biologically very important. Living cells need globular proteins in correct 
quantities at appropriate times. It is therefore as important to be able easily
to degrade these proteins as it is to be able to synthesize them. Globular pro- 
teins in living cells usually have a rather rapid turnover and their native 
states have therefore evolved to be only marginally stable. Moreover, the cat- 
alytic activities of enzymes, and other important functions of proteins, gener- 
ally require some structural flexibility, which would be inconsistent with a 
rigidly stabilized structure. 

Kinetic factors are important for folding 

High resolution x-ray structure determinations of several hundred proteins 
have shown that in each case the specific sequence of a polypeptide chain 
appears to yield only a single, compact, biologically active fold in the native 
state. This fold generally has many substates with minor structural
differences between them, as will be discussed later in this chapter, but all of these
substates have the same general fold. Comparisons with structure
determinations in solution by NMR show that the same fold also prevails in solution.
In other words, under physiological conditions there appears to be one
conformation for a given amino acid sequence that has a significantly lower free
energy than any other. How is this folded state reached? 

Intuitively one might imagine that all protein molecules search through
all possible conformations in a random fashion until they are frozen at the
lowest energy in the conformation of the native state. The biophysicist Cyrus
Levinthal showed in 1968 by a simple calculation that this is impossible.
Assume as a gross simplification that each peptide group has only three
possible conformations, the allowed regions α, β and L in the Ramachandran
diagram (see Figure 1.7), and that it converts one conformation into another
in the shortest possible time, one picosecond (10^-12 seconds). A polypeptide
chain of 150 residues would then have 3^150 ≈ 4 × 10^71 possible conformations.
To search all these conformations would require about 10^52 years
(4 × 10^59 seconds), an astronomical number compared with the actual folding
time, which is between 0.1 and 1000 seconds both in vivo and in vitro. To occur
on this short time scale, the folding process must be directed in some way
through a kinetic pathway of unstable intermediates to escape sampling a
large number of irrelevant conformations.
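
The arithmetic behind this estimate can be checked directly. The following is a minimal sketch under the same simplifying assumptions as the text (three conformations per residue, one picosecond per conversion, 150 residues); the function name levinthal_estimate and the choice of a 365.25-day year are assumptions made here for illustration.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed Julian year


def levinthal_estimate(n_residues: int,
                       states_per_residue: int = 3,
                       seconds_per_step: float = 1e-12) -> dict:
    """Estimate an exhaustive conformational search, working in log10."""
    # Exact count as a Python integer (arbitrary precision).
    n_conformations = states_per_residue ** n_residues
    # Logarithms avoid overflow when converting to seconds and years.
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations + math.log10(seconds_per_step)
    log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)
    return {
        "n_conformations": n_conformations,
        "log10_conformations": log10_conformations,
        "log10_seconds": log10_seconds,
        "log10_years": log10_years,
    }


if __name__ == "__main__":
    result = levinthal_estimate(150)
    print(f"conformations ~ 10^{result['log10_conformations']:.1f}")
    print(f"search time   ~ 10^{result['log10_seconds']:.1f} seconds")
    print(f"              ~ 10^{result['log10_years']:.1f} years")
```

Whatever the exact inputs, the search time exceeds observed folding times by dozens of orders of magnitude, so the conclusion does not depend on the details of the simplification.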

Such a folding mechanism raises several important questions that are dif- 
ficult to examine experimentally, since the possible intermediates have a very 
short lifetime. If kinetic factors are important for the folding process it is
possible that the observed folded conformation is not the one with the low- 
est free energy but rather the most stable of those conformations that are 
kinetically accessible. The protein might be kinetically trapped in a local low 
energy state with a high energy barrier that prevents it from reaching the 
global energy minimum which might have a different fold. In such a case 
structure prediction by energy calculations would give the wrong structure 
even if such calculations could be made with great accuracy. One important
question therefore is how a living cell can prevent the folding pathway from 
becoming blocked at an intermediate stage. The most common obstacles to 
correct folding seem to be (1) aggregation of the intermediates through 
exposed hydrophobic groups, (2) formation of incorrect disulfide bonds, and 
(3) isomerization of proline residues. To circumvent these three obstacles cells 
produce special proteins that assist the folding process, as we shall discuss 
later in this chapter. 




[Figure 6.2 labels: unfolded, molten globule, folded]

Figure 6.2 The molten globule state is an
important intermediate in the folding pathway
when a polypeptide chain converts from an
unfolded to a folded state. The molten globule
has most of the secondary structure of the
native state but it is less compact and the
proper packing interactions in the interior
of the protein have not been formed.



An alternative way to remove kinetic barriers is exemplified by α-lytic
protease, a bacterial enzyme which belongs to the serine protease
superfamily of enzymes (Chapter 11). Like many other proteases it is synthesized and
folded in vivo as an inactive precursor protein with a prosegment of 77
residues. This segment is excised after folding to produce the active enzyme.
Unfolded precursor protein refolds easily in vitro but unfolded α-lytic protease
lacking the prosegment does not refold. However, a solution of unfolded
enzyme can be induced to refold by adding the excised prosegment. The
capacity for folding obviously exists in the unfolded enzyme but there is
a barrier present somewhere in the folding pathway that prevents folding.
The prosegment removes this kinetic barrier, presumably by interacting with
the enzyme in the unfolded state and thereby lowering the free energy of
the transition states for folding, just as enzymes lower the free energy of
transition states for chemical reactions and thereby increase the rates of the
reactions (see Chapter 11).

Molten globules are intermediates in folding 

The first observable event in the folding pathway of at least some proteins is
a collapse of the flexible disordered unfolded polypeptide chain into a partly
organized globular state, which is called the molten globule (Figure 6.2).
This event is fast, usually within the deadtime of the experimental
observation, which is a few milliseconds. We therefore know almost nothing about
the process that leads to the molten globule, but we know some of the
properties of this state. The molten globule has most of the secondary structure of
the native state and in some cases even native-like positions of the α helices
and β strands. It is less compact than the native structure and the proper
packing interactions in the interior of the protein have not been formed. The
interior side chains may be mobile, more closely resembling a liquid than the
solid-like interior of the native state. Also loops and other elements of surface
structure remain largely unfolded, with different conformations. The molten
globule should, therefore, not be viewed as a single structural entity but
as an ensemble of related structures that are rapidly interconverting (see
Figure 6.3a).

In a second step, which can last up to 1 second, persistent native-like
elements of tertiary structure begin to develop, possibly in the form of
subdomains that are not yet properly docked. The ensemble of conformations is
much reduced compared with those of the molten globule but it is still far
from a single form. The single native form is reached in the final stage of
folding, which involves the formation of native interactions throughout the
protein, including hydrophobic packing in the interior as well as the fixation of
surface loops.






Burying hydrophobic side chains is a key event 

The collapse of the unfolded state to generate the molten globule embodies 
the main mystery of protein folding. What is the driving force behind the 
choice of native tertiary fold from a randomly oriented polypeptide chain? 

There is very little change in free energy by forming the internal
hydrogen bonds that are characteristic of α helices and β sheets because in the
unfolded state equally stable hydrogen bonds can be formed to water
molecules. Secondary structure formation therefore cannot be the thermodynamic
driving force of protein folding. On the other hand there is a large free
energy change by bringing hydrophobic side chains out of contact with
water and into contact with each other in the interior of a globular entity.
Thus the most likely scenario is that the polypeptide chain begins to form a
compact shape with hydrophobic side chains at least partially buried very
early in the folding process. This scenario has several important
consequences. First, it vastly reduces the number of possible conformations that
need to be searched because only those that are sterically accessible within
this shape can be sampled. Second, when some of the side chains are partly
buried, their polar backbone -NH and -CO groups are also buried in a
hydrophobic environment unable to form hydrogen bonds to water. This is
energetically unfavorable unless they form hydrogen bonds to each other, which
they can only do if they are close together. The simplest way to form such
bonds is by forming elements of secondary structure: α helices and β sheets.
The formation of secondary structure early in the folding process can therefore be
regarded as a consequence of burying hydrophobic side chains and not as a
driving force for the formation of the molten globule.

Looking at the amino acid sequence of a globular protein one finds that
hydrophobic side chains are usually scattered along the entire sequence in a
seemingly random manner. In the native state of the folded protein about
half of these side chains are buried in the interior and the rest are scattered
on the surface of the protein, surrounded by hydrophilic side chains. The
buried hydrophobic side chains are not clustered in the sequence but are
scattered along the entire polypeptide chain. What causes these residues to
be selectively buried during the early and rapid formation of the molten
globule? This question must be answered before one can solve the folding
problem and be able to predict the fold of a protein from its amino acid
sequence.

Both single and multiple folding pathways have
been observed

In order to understand fully any folding pathway, all states of the pathway
must be characterized both structurally and energetically. The simplified
diagram in Figure 6.3 illustrates that during the folding process the protein
proceeds from a high energy unfolded state to a low energy native state through
metastable intermediate states with local low energy minima separated by
unstable transition states of higher energy. The characterization of these 
states is not trivial and many different experimental techniques are 
employed, including NMR, hydrogen exchange, spectroscopy and thermo- 
chemistry. 

Recently Alan Fersht, Cambridge University, has developed a protein
engineering procedure for such studies. The technique is based on
investigation of the effects on the energetics of folding of single-site mutations in a
protein of known structure. For example, if minimal mutations such as Ala
to Gly in the solvent-exposed face of an α helix destabilize both an
intermediate state and the native state, as well as the transition state between
them, it is likely that the helix is already fully formed in the intermediate
state. If on the other hand the mutations destabilize the native state but do 
not affect the energy of the intermediate or transition states at all, it is likely 
that the helix is not formed until after the transition state. 
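
The reasoning in this paragraph amounts to a simple decision rule. The sketch below is a deliberately simplified illustration, not the published procedure: the function name interpret_mutation, the use of destabilization energies in kcal/mol as inputs, and the tolerance of 0.2 kcal/mol are all assumptions introduced here.

```python
def interpret_mutation(ddg_intermediate: float,
                       ddg_transition: float,
                       ddg_native: float,
                       tolerance: float = 0.2) -> str:
    """Interpret how far a structural element has formed along the pathway.

    Each argument is the destabilization (kcal/mol) that a single-site
    mutation causes in the intermediate, transition and native states.
    The tolerance is a hypothetical cutoff for "no effect" / "same effect".
    """
    # A mutation that does not destabilize the native state tells us nothing.
    if ddg_native <= tolerance:
        return "uninformative"
    # Native state destabilized, earlier states untouched:
    # the element forms only after the transition state.
    if abs(ddg_intermediate) <= tolerance and abs(ddg_transition) <= tolerance:
        return "formed after transition state"
    # All three states destabilized to the same extent:
    # the element is already fully formed in the intermediate.
    if (abs(ddg_intermediate - ddg_native) <= tolerance
            and abs(ddg_transition - ddg_native) <= tolerance):
        return "already formed in intermediate"
    # Anything in between does not fit either limiting case.
    return "partially formed or ambiguous"


if __name__ == "__main__":
    # Hypothetical energies for two Ala-to-Gly mutations in a helix.
    print(interpret_mutation(1.0, 1.0, 1.0))
    print(interpret_mutation(0.0, 0.1, 1.2))
```

Only the two limiting cases described in the text are classified; intermediate values are reported as ambiguous rather than interpreted.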



[Figure 6.3 labels: U1...Un, M1...Mm; Unfolded, Molten globule, Transition state; axis: folding process]

Figure 6.3 The unfolded state is an ensemble
of a large number of conformationally
different molecules, U1...Un, which undergo
rapid interconversions. The molten globule is
an ensemble of structurally related molecules,
M1...Mm, which are rapidly interconverting
and which slowly change to a single unique
conformation, the folded state F. During the
folding process the protein proceeds from a
high energy unfolded state to a low energy
native state. The conversion from the molten
globule state to the folded state is slow and
passes through a high energy transition
state, T.



The small bacterial ribonuclease, barnase, is a single chain protein with
110 amino acids and no disulfide bridges. Its three-dimensional structure was
determined by the group of Guy Dodson, York University, and comprises
three amino terminal α helices and a carboxy terminal five-stranded
antiparallel β sheet (Figure 6.4). The group of Alan Fersht have examined the effects
of mutations all along the structure and have made a detailed residue by
residue characterization of its folding intermediate and transition states.
They have concluded from their results that the intermediate molten globule
state already has not only most of the native secondary structure elements
but also the native-like relative positions of the α helix and β sheet as well as



Figure 6.4 Schematic diagram of the structure
of the enzyme barnase which is folded into a
five-stranded antiparallel β sheet (blue) and
two α helices (red).






the relative positions of the β strands within the sheet. These results are
consistent with the notion that the folding of barnase proceeds through a
single major transition state and consequently through one major pathway 
(Figure 6.5a). 

In contrast, folding of the enzyme lysozyme involves parallel pathways
and distinct folding domains. Hen egg-white lysozyme was the first enzyme
to have its structure determined crystallographically, in the laboratory of
David Phillips then at the Royal Institution, London in 1965. The native
structure consists of two lobes separated by a cleft (Figure 6.6). The first lobe
comprises five α helices and the second is predominantly a three-stranded
antiparallel β sheet. The folding of lysozyme has been studied extensively by
a variety of complementary techniques (NMR, circular dichroism,
fluorescence, hydrogen-deuterium exchange) to follow the development of
different aspects of the structure such as formation of secondary structure, burial
of hydrophobic aromatic groups and formation of hydrogen bonds. The
group of Christopher Dobson, Oxford University, has used pulsed amide
hydrogen-deuterium exchange to follow secondary structure formation.
Amide hydrogen atoms are readily exchanged with the solvent in unfolded
proteins, but this exchange is often strongly inhibited in a folded protein,
especially for those amide groups that are hydrogen bonded in secondary
structure elements. As a result, by measuring the rate of amide-hydrogen
exchange as a function of folding time it is possible to monitor the formation
of structure during the folding reaction. At 20 milliseconds, two major
intermediate stages of lysozyme were detected: one in which the α-helical domain



Figure 6.5 (a) Some proteins such as barnase
fold through one major pathway whereas
others fold through multiple pathways.
(b) The folding of the enzyme lysozyme
proceeds through at least two different
pathways.




Figure 6.6 Schematic diagram of the structure
of the enzyme lysozyme which folds into two
domains. One domain is essentially α-helical
whereas the second domain comprises a
three-stranded antiparallel β sheet and two α helices.
There are three disulfide bonds (green), two in
the α-helical domain and one in the second
domain.






